The raw dataset contains many redundant and uninformative variables, so in this project we first select the useful columns listed below.
BATHRM - Number of Full Bathrooms
HF_BATHRM - Number of Half Bathrooms (no bathtub or shower)
HEAT - Heating
AC - Cooling
NUM_UNITS - Number of Units
ROOMS - Number of Rooms
BEDRM - Number of Bedrooms
AYB - The earliest time the main portion of the building was built
YR_RMDL - Year structure was remodeled
EYB - The year an improvement was built more recent than actual year built
STORIES - Number of stories in primary dwelling
SALEDATE - Date of most recent sale
PRICE - Price of most recent sale
QUALIFIED - Qualified
SALE_NUM - Sale Number
GBA - Gross building area in square feet
BLDG_NUM - Building Number on Property
STYLE - Style
STRUCT - Structure
GRADE - Grade
CNDTN - Condition
EXTWALL - Exterior wall
ROOF - Roof type
INTWALL - Interior wall
KITCHENS - Number of kitchens
FIREPLACES - Number of fireplaces
LANDAREA - Land area of property in square feet
WARD - Ward (District is divided into eight wards, each with approximately 75,000 residents)
Second, apart from the NA values, there are also some “No Data” strings in this dataset (e.g. in the HEAT and GRADE columns, as printed below). After recoding these strings as NA and dropping all rows with missing values, we obtain a cleaned dataset with 33165 rows and 27 columns.
## [1] "HEAT"
## [1] "GRADE"
Apart from the price, the condition of a property is always a key factor we care about when planning to buy one. In this section, our goal is to predict the condition of a particular property without seeing any photos of it. We can hardly obtain the specific features that directly reveal a property's condition; in other words, we cannot establish causality between individual features and condition. Thus, in this report, we simply try to predict whether a property tends to be in better condition from general features that can be easily collected.
Analogous to ranking problems (e.g. customer reviews on Amazon), the assessment of condition varies from individual to individual, so there is no uniform rule for classifying properties into different levels of condition. Below is the detailed explanation of condition from the Marshall & Swift Condition Assessment (page E-6).
Excellent Condition - All items that can normally be repaired or refinished have recently been corrected, such as new roofing, paint, furnace overhaul, state of the art components, etc. With no functional inadequacies of any consequence and all major short-lived components in like-new condition, the overall effective age has been substantially reduced upon complete revitalization of the structure regardless of the actual chronological age.
Very Good Condition - All items well maintained, many having been overhauled and repaired as they have shown signs of wear, increasing the life expectancy and lowering the effective age with little deterioration or obsolescence evident with a high degree of utility.
Good Condition - No obvious maintenance required but neither is everything new. Appearance and utility are above the standard and the overall effective age will be lower than the typical property.
Average Condition - Some evidence of deferred maintenance and normal obsolescence with age in that a few minor repairs are needed along with some refinishing. But with all major components still functional and contributing toward an extended life expectancy, effective age and utility is standard for like properties of its class and usage.
Fair Condition (Badly worn) - Much repair needed. Many items need refinishing or overhauling, deferred maintenance obvious, inadequate building utility and services all shortening the life expectancy and increasing the effective age.
Poor Condition (Worn Out) - Repair and overhaul needed on painted surfaces, roofing, plumbing, heating, numerous functional inadequacies, substandard utilities etc. (found only in extraordinary circumstances). Excessive deferred maintenance and abuse, limited value-in-use, approaching abandonment or major reconstruction, reuse or change in occupancy is imminent. Effective age is near the end of the scale regardless of the actual chronological age.
The distribution of the number of properties across conditions looks like a tall, narrow normal distribution, which is reasonable. Over 99% of the properties are in “Average”, “Good”, or “Very Good” condition. Therefore, predicting the other three levels (“Poor”, “Fair”, and “Excellent”) may cause some problems, which we discuss later.
##
## Poor Fair Average Good Very Good Excellent
## 11 87 9038 19661 4277 91
For simplicity, we first try to distinguish whether a property is above or below average condition. In other words, we simply split the condition into two levels, “<= Average” (including “Poor”, “Fair”, and “Average”) and “> Average” (including “Good”, “Very Good”, and “Excellent”).
##
## <= Average > Average
## 9136 24029
With the condition grouped into two levels, we can apply a LASSO-regularized logistic regression to this binomial prediction problem. Unfortunately, it does not select a small group of variables when lambda is within one standard error of the best value (over 48 features remain).
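A minimal sketch of this LASSO step with `cv.glmnet()`; the object name `train` and the 0/1 outcome `above_avg` are assumptions, not the exact code used here:

```r
library(glmnet)

# dummy-code the factors and drop the intercept column
x <- model.matrix(above_avg ~ ., data = train)[, -1]
y <- train$above_avg   # 1 = "> Average", 0 = "<= Average"

set.seed(1)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)   # alpha = 1 -> LASSO

# coefficients at the 1-standard-error lambda: too many features survive here,
# so a larger lambda further along the path is used to shrink the model to ~8 features
coef(cv_fit, s = "lambda.1se")
```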
Thus, we instead choose a model with 8 features that lies about 5 standard errors away from the best model. Although selecting a model outside the 1-standard-error range introduces some bias, it still performs well on the test data: the best model has a prediction accuracy of 80.88%, and this simplified model reaches a quite good 79.89%.
## [1] 0.7988827
As the confusion matrix of the prediction results shows, this model performs better at predicting properties in above-average condition. In the test data, 80.9% of the properties predicted as above average actually are, and 94.5% of the above-average properties are correctly identified.
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 11277
##
##
## | observed.classes
## predicted.classes | 0 | 1 | Row Total |
## ------------------|-----------|-----------|-----------|
## 0 | 1297 | 446 | 1743 |
## | 0.744 | 0.256 | 0.155 |
## | 0.416 | 0.055 | |
## | 0.115 | 0.040 | |
## ------------------|-----------|-----------|-----------|
## 1 | 1822 | 7712 | 9534 |
## | 0.191 | 0.809 | 0.845 |
## | 0.584 | 0.945 | |
## | 0.162 | 0.684 | |
## ------------------|-----------|-----------|-----------|
## Column Total | 3119 | 8158 | 11277 |
## | 0.277 | 0.723 | |
## ------------------|-----------|-----------|-----------|
##
##
LASSO selects 8 variables (“HEAT”, “AC”, “AYB”, “YR_RMDL”, “EYB”, “PRICE”, “QUALIFIED”, “SALE_NUM”). We then run a second round of feature selection via best-subset GLM, which also removes “HEAT”; in fact, “HEAT” contributes little to the prediction.
## [1] "BATHRM" "HF_BATHRM" "HEAT" "AC" "NUM_UNITS"
## [6] "ROOMS" "BEDRM" "AYB" "YR_RMDL" "EYB"
## [11] "STORIES" "PRICE" "QUALIFIED" "SALE_NUM" "GBA"
## [16] "BLDG_NUM" "STYLE" "STRUCT" "GRADE" "CNDTN"
## [21] "EXTWALL" "ROOF" "INTWALL" "KITCHENS" "FIREPLACES"
## [26] "LANDAREA" "WARD"
## Morgan-Tatar search since family is non-gaussian.
## Note: factors present with more than 2 levels.
## HEAT AC AYB YR_RMDL
## Mode :logical Mode:logical Mode:logical Mode:logical
## FALSE:5 TRUE:5 TRUE:5 TRUE:5
##
##
##
##
## EYB PRICE QUALIFIED SALE_NUM
## Mode:logical Mode :logical Mode :logical Mode :logical
## TRUE:5 FALSE:2 FALSE:1 FALSE:2
## TRUE :3 TRUE :4 TRUE :3
##
##
##
## Criterion
## Min. :890.4
## 1st Qu.:893.2
## Median :894.2
## Mean :895.1
## 3rd Qu.:897.8
## Max. :899.9
##
## Call: glm(formula = y ~ ., family = family, data = Xi, weights = weights)
##
## Coefficients:
## (Intercept) ACY AYB YR_RMDL EYB
## 0.6377 0.9609 -0.3714 0.7846 0.7767
## PRICE QUALIFIEDU SALE_NUM
## 0.3674 -0.6431 0.2364
##
## Degrees of Freedom: 999 Total (i.e. Null); 992 Residual
## Null Deviance: 1180
## Residual Deviance: 876.4 AIC: 892.4
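The best-subset search above can be sketched with the `bestglm` package (whose “Morgan-Tatar search” message appears above); `subsample` and `above_avg` are placeholder names, the choice of information criterion is an assumption, and the printed degrees of freedom suggest the search was run on a roughly 1,000-row subsample:

```r
library(bestglm)

# bestglm() expects the predictors first and the response as the last column
Xy <- subsample[, c("HEAT", "AC", "AYB", "YR_RMDL", "EYB",
                    "PRICE", "QUALIFIED", "SALE_NUM", "above_avg")]

best_fit <- bestglm(Xy, family = binomial, IC = "AIC")
summary(best_fit$Subsets)   # which variables appear in the best subsets of each size
best_fit$BestModel          # final GLM; HEAT is dropped
```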
After the feature selection, we get a simple GLM to predict the condition (above or below average). This model performs quite well on the ROC curve, with an AUC greater than 0.8.
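A sketch of the ROC computation with the pROC package; `glm_fit` and `test` are placeholder names for the selected GLM and the held-out data:

```r
library(pROC)

# predicted probability of being in above-average condition
pred_prob <- predict(glm_fit, newdata = test, type = "response")

roc_obj <- roc(response = test$above_avg, predictor = pred_prob)
plot(roc_obj)
auc(roc_obj)   # area under the curve
```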
## Area under the curve: 0.8234
As for McFadden's pseudo R^2 value, it is judged on a different scale than the usual R^2. McFadden states: “while the R2 index is a more familiar concept to planners who are experienced in OLS, it is not as well behaved as the rho-squared measure, for ML estimation. Those unfamiliar with rho-squared should be forewarned that its values tend to be considerably lower than those of the R2 index…For example, values of 0.2 to 0.4 for rho-squared represent EXCELLENT fit.” So a GLM with a McFadden's pseudo R^2 between 0.2 and 0.4 already explains the data quite well.
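The values below can be obtained with `pR2()` from the pscl package (sketch; `glm_fit` is again a placeholder for the fitted GLM):

```r
library(pscl)

# llh / llhNull are the log-likelihoods of the fitted and null models;
# McFadden = 1 - llh / llhNull; r2ML and r2CU are the Cox-Snell and Nagelkerke variants
pR2(glm_fit)
```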
## llh llhNull G2 McFadden r2ML
## -438.2115623 -590.0975621 303.7719997 0.2573913 0.2619709
## r2CU
## 0.3781437
We have solved the two-level prediction; now we attack the 6-level prediction. A classification tree is a better way to predict a categorical variable with multiple levels.
We feed the variables selected above into a decision tree. Here, the tree uses only two of them, “EYB” and “PRICE.” This simple model suggests that an improvement year (EYB) after 1964 and a price above 2.4 million indicate a property in better condition. However, the model has two defects. First, the overall prediction accuracy is only 67.3%, and the confusion matrix shows it performs poorly on the “Very Good” condition (only 51.1% correct). Second, it never predicts the three minor levels, “Poor”, “Fair”, and “Excellent.”
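A minimal sketch of this tree fit with rpart and rpart.plot, using the features selected earlier; `train` and `test` are placeholder names:

```r
library(rpart)
library(rpart.plot)

tree_fit <- rpart(CNDTN ~ AC + AYB + YR_RMDL + EYB + PRICE + QUALIFIED + SALE_NUM,
                  data = train, method = "class")

rpart.plot(tree_fit)                        # only EYB and PRICE survive as split variables
tree_pred <- predict(tree_fit, test, type = "class")
mean(tree_pred == test$CNDTN)               # overall test accuracy
```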
## [1] 0.67314
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 11277
##
##
## | tree.observed
## tree.predicted | Poor | Fair | Average | Good | Very Good | Excellent | Row Total |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Average | 0 | 15 | 1479 | 569 | 10 | 0 | 2073 |
## | 0.000 | 0.007 | 0.713 | 0.274 | 0.005 | 0.000 | 0.184 |
## | 0.000 | 0.577 | 0.478 | 0.085 | 0.007 | 0.000 | |
## | 0.000 | 0.001 | 0.131 | 0.050 | 0.001 | 0.000 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Good | 1 | 11 | 1612 | 6017 | 1360 | 17 | 9018 |
## | 0.000 | 0.001 | 0.179 | 0.667 | 0.151 | 0.002 | 0.800 |
## | 1.000 | 0.423 | 0.521 | 0.903 | 0.928 | 0.515 | |
## | 0.000 | 0.001 | 0.143 | 0.534 | 0.121 | 0.002 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Very Good | 0 | 0 | 1 | 74 | 95 | 16 | 186 |
## | 0.000 | 0.000 | 0.005 | 0.398 | 0.511 | 0.086 | 0.016 |
## | 0.000 | 0.000 | 0.000 | 0.011 | 0.065 | 0.485 | |
## | 0.000 | 0.000 | 0.000 | 0.007 | 0.008 | 0.001 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 1 | 26 | 3092 | 6660 | 1465 | 33 | 11277 |
## | 0.000 | 0.002 | 0.274 | 0.591 | 0.130 | 0.003 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##
##
To improve on the single-tree model, we apply the random forest algorithm. The prediction accuracy is better in this model, but it still never predicts the three minor levels.
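A sketch of the random-forest fit with default settings from the randomForest package; object names are placeholders, and CNDTN is assumed to be stored as a factor so that a classification forest is grown:

```r
library(randomForest)

set.seed(1)
rf_fit <- randomForest(CNDTN ~ AC + AYB + YR_RMDL + EYB + PRICE + QUALIFIED + SALE_NUM,
                       data = train, ntree = 500)

rf_pred <- predict(rf_fit, test)
mean(rf_pred == test$CNDTN)   # test accuracy, compared with the single tree above
```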
##
##
## Cell Contents
## |-------------------------|
## | N |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 11277
##
##
## | tree.observed
## tree.predicted | Poor | Fair | Average | Good | Very Good | Excellent | Row Total |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Average | 1 | 22 | 1612 | 601 | 9 | 0 | 2245 |
## | 0.000 | 0.010 | 0.718 | 0.268 | 0.004 | 0.000 | 0.199 |
## | 1.000 | 0.846 | 0.521 | 0.090 | 0.006 | 0.000 | |
## | 0.000 | 0.002 | 0.143 | 0.053 | 0.001 | 0.000 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Good | 0 | 4 | 1479 | 5961 | 1164 | 16 | 8624 |
## | 0.000 | 0.000 | 0.171 | 0.691 | 0.135 | 0.002 | 0.765 |
## | 0.000 | 0.154 | 0.478 | 0.895 | 0.795 | 0.485 | |
## | 0.000 | 0.000 | 0.131 | 0.529 | 0.103 | 0.001 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Very Good | 0 | 0 | 1 | 98 | 292 | 17 | 408 |
## | 0.000 | 0.000 | 0.002 | 0.240 | 0.716 | 0.042 | 0.036 |
## | 0.000 | 0.000 | 0.000 | 0.015 | 0.199 | 0.515 | |
## | 0.000 | 0.000 | 0.000 | 0.009 | 0.026 | 0.002 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## Column Total | 1 | 26 | 3092 | 6660 | 1465 | 33 | 11277 |
## | 0.000 | 0.002 | 0.274 | 0.591 | 0.130 | 0.003 | |
## ---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##
##
## [1] 0.6974373
Because the splitting criterion of the decision tree is information gain (Shannon entropy), it tends to make decisions in favor of the majority classes. Here, most of the data are in Good, Very Good, and Average condition, so the tree rarely predicts Excellent, Poor, or Fair.
We have tried to solve this “discrimination” problem, but still in vain. Therefore, we need a model that can treat all levels of a categorical variable equally even though the training data for each level are not of the same size.
In this section, we use time series analysis to find patterns in the price history and make a prediction. We select SALEDATE and use the mean sale price of properties in each month of the past years as the series.
## [1] "BATHRM" "HF_BATHRM" "HEAT" "AC" "NUM_UNITS"
## [6] "ROOMS" "BEDRM" "AYB" "YR_RMDL" "EYB"
## [11] "STORIES" "SALEDATE" "PRICE" "QUALIFIED" "SALE_NUM"
## [16] "GBA" "BLDG_NUM" "STYLE" "STRUCT" "GRADE"
## [21] "CNDTN" "EXTWALL" "ROOF" "INTWALL" "KITCHENS"
## [26] "FIREPLACES" "LANDAREA" "WARD"
Now we need to make a time series object. We set the frequency to 12 (for 12 months), starting at 1992 and increasing in single increments.
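A sketch of this step and of the decompositions summarized below; `price_by_month` is a placeholder for the vector of monthly mean sale prices ordered from January 1992:

```r
# monthly mean sale price as a time series starting in January 1992
ts_price <- ts(price_by_month, frequency = 12, start = c(1992, 1))
plot(ts_price)

# additive and multiplicative decompositions into trend, seasonal, and random parts
dec_add  <- decompose(ts_price)
dec_mult <- decompose(ts_price, type = "multiplicative")

summary(dec_add$seasonal)
summary(dec_mult$seasonal)
dec_add$figure   # the 12 monthly seasonal effects
```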
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -30436.3 -9113.5 -2642.8 135.1 12406.2 42514.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9471 0.9869 0.9997 1.0002 1.0181 1.0803
## Additive seasonal component (identical for every year, 1992-2018):
##        Jan        Feb        Mar        Apr        May        Jun
##  -5138.842  -2642.783 -30436.308  -3799.698  12406.175  30207.216
##        Jul        Aug        Sep        Oct        Nov        Dec
##  42514.012   9446.516 -28633.102 -21540.550  -9113.446   6730.810
## Multiplicative seasonal component (identical for every year, 1992-2018):
##       Jan       Feb       Mar       Apr       May       Jun
## 0.9996520 0.9944291 0.9516181 0.9895590 1.0138775 1.0480446
##       Jul       Aug       Sep       Oct       Nov       Dec
## 1.0803478 1.0134904 0.9526006 0.9470513 0.9869390 1.0223905
## [1] -30436.31
## [1] 42514.01
We can see the minimum of the adjusted seasonal component is in March and the maximum is in July. Then we use the HoltWinters() function to smooth out the data.
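A sketch of the smoothing step with the base-R `HoltWinters()` function (level, trend, and additive seasonal component by default):

```r
hw_price <- HoltWinters(ts_price)
hw_price            # smoothing parameters alpha, beta, gamma and the fitted coefficients
hw_price$SSE        # sum of squared errors of the one-step-ahead fits

plot(hw_price)      # black: observed series, red: fitted values
```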
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = ts_price)
##
## Smoothing parameters:
## alpha: 0.1489657
## beta : 0.03870238
## gamma: 0.2672223
##
## Coefficients:
## [,1]
## a 866753.64592
## b 1538.57592
## s1 27313.52461
## s2 -16352.23455
## s3 -11329.32159
## s4 -9761.26185
## s5 -54.01588
## s6 8977.73667
## s7 -5795.84472
## s8 -51303.78093
## s9 15816.28432
## s10 51094.08402
## s11 70395.84302
## s12 54358.39314
## [1] 1.414129e+12
Since the SSE value is very high and the two lines seem fairly inconsistent, the time series analysis is not a good fit for the price.
Now let's change our target series to the sales volume.
It seems that there is some seasonality. Let's use an additive model to decompose the dataset and quantify the seasonal component.
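The same steps for the monthly sales volume (sketch; `sales_by_month` is a placeholder for the monthly counts of sales):

```r
ts_salenum <- ts(sales_by_month, frequency = 12, start = c(1992, 1))

dec_salenum <- decompose(ts_salenum)   # additive decomposition
plot(dec_salenum)
summary(dec_salenum$seasonal)
```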
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -33.87399 -7.95399 -3.06079 -0.02138 11.92059 25.80934
## Additive seasonal component (identical for every year, 1992-2018):
##         Jan         Feb         Mar         Apr         May         Jun
## -20.5832212 -33.8739904  -7.9539904  -3.4706571  11.3010096  25.8093429
##         Jul         Aug         Sep         Oct         Nov         Dec
##  21.9504327  12.5401763  -3.0607853   0.1779968  -6.6649519   3.8286378
## [1] -33.87399
## [1] 25.80934
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = ts_salenum)
##
## Smoothing parameters:
## alpha: 0.2554623
## beta : 0.005922093
## gamma: 0.4265576
##
## Coefficients:
## [,1]
## a 202.2356681
## b 0.4984436
## s1 37.1671414
## s2 8.4132978
## s3 11.7017848
## s4 3.6140869
## s5 8.5262782
## s6 -35.8421207
## s7 -64.5330057
## s8 7.6837678
## s9 15.0345282
## s10 66.2757638
## s11 66.0269946
## s12 -6.0070598
## [1] 132759.8
The sum of squared errors of prediction (SSE) is still very large, which indicates the time series forecast does not fit well. We can plot the original values and the forecast values on one chart: black is the original series and red is the predicted values.
However, the forecast for recent years seems to fit well, so let's make a time series forecast specifically for the most recent 6 years.
From the plot, there seems to be some clear seasonality that is consistent over time, so we repeat the same procedure on this shorter window.
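A sketch of restricting the series to the most recent years with `window()` and decomposing it again:

```r
# keep only the observations from January 2013 onward
ts_salenum_2013 <- window(ts_salenum, start = c(2013, 1))

dec_2013 <- decompose(ts_salenum_2013)
summary(dec_2013$seasonal)
```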
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -80.59097 -16.21597 -2.48681 0.00275 34.80486 61.34653
## Additive seasonal component (identical for every year, 2013-2018):
##         Jan         Feb         Mar         Apr         May         Jun
## -52.7118056 -80.5909722 -16.2159722  -2.4868056  34.8048611  61.3465278
##         Jul         Aug         Sep         Oct         Nov         Dec
##  56.0381944  19.3798611  -8.3201389  -0.1784722 -11.6368056   0.5715278
## [1] -80.59097
## [1] 61.34653
We can see the minimum adjusted seasonal component is in February and the maximum is in June. Then we again use the HoltWinters() function to smooth out the data and make a forecast.
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = ts_salenum_2013)
##
## Smoothing parameters:
## alpha: 0.2047107
## beta : 0
## gamma: 0.6408291
##
## Coefficients:
## [,1]
## a 222.0120899
## b 0.7565559
## s1 32.2308793
## s2 4.1331619
## s3 9.0735874
## s4 -0.4566040
## s5 -4.0801846
## s6 -42.9989697
## s7 -82.0075402
## s8 -4.6304142
## s9 2.3630781
## s10 63.9622132
## s11 53.1728087
## s12 -55.5462750
alpha = 0.2 means the influence of recent observations on the level estimate is fairly small; beta = 0 means the slope of the trend remains constant throughout the whole time series; gamma = 0.64 means the seasonal component is based on both recent and historical observations.
## [1] 83028.09
The SSE becomes smaller, and as the plot shows, the time series forecast is more consistent with the original observations.
Let's make a prediction for the next 12 months. The next peak value is predicted to be in the middle of 2019, while the trough is predicted to be at the beginning of 2019. There will also be a slump after the peak.
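A sketch of the 12-month forecast from the Holt-Winters fit above, using `predict()` from base R; `ts_salenum_2013` and `hw_2013` are placeholder names:

```r
hw_2013 <- HoltWinters(ts_salenum_2013)

# forecast the next 12 months with 95% prediction intervals
fc <- predict(hw_2013, n.ahead = 12, prediction.interval = TRUE)

plot(hw_2013, fc)   # observed and fitted series with the forecast band appended
```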
## [1] "X.1" "BATHRM" "HF_BATHRM"
## [4] "HEAT" "AC" "NUM_UNITS"
## [7] "ROOMS" "BEDRM" "AYB"
## [10] "YR_RMDL" "EYB" "STORIES"
## [13] "SALEDATE" "PRICE" "QUALIFIED"
## [16] "SALE_NUM" "GBA" "BLDG_NUM"
## [19] "STYLE" "STRUCT" "GRADE"
## [22] "CNDTN" "EXTWALL" "ROOF"
## [25] "INTWALL" "KITCHENS" "FIREPLACES"
## [28] "USECODE" "LANDAREA" "GIS_LAST_MOD_DTTM"
## [31] "SOURCE" "CMPLX_NUM" "LIVING_GBA"
## [34] "FULLADDRESS" "CITY" "STATE"
## [37] "ZIPCODE" "NATIONALGRID" "LATITUDE"
## [40] "LONGITUDE" "ASSESSMENT_NBHD" "ASSESSMENT_SUBNBHD"
## [43] "CENSUS_TRACT" "CENSUS_BLOCK" "WARD"
## [46] "SQUARE" "X" "Y"
## [49] "QUADRANT"
## LATITUDE LANDAREA YR_RMDL GRADE WARD ASSESSMENT_NBHD
## kNN_acc: 0.5080623 (k = 6), 0.5059077 (k = 7), 0.5031971 (k = 8),
##          0.5043091 (k = 9), 0.5041701 (k = 10)
##
## k 1 2 3 4
## 1 1918 764 434 301
## 2 769 1449 642 298
## 3 519 899 1678 858
## 4 378 461 811 2209
## [1] 14388
## [1] 0.5041701
##
## 3 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] LATITUDE YR_RMDL
##
## Root node error: 3.4086e+17/9955 = 3.424e+13
##
## n= 9955
##
## CP nsplit rel error xerror xstd
## 1 0.29866 0 1.00000 1.00026 0.088120
## 2 0.26259 2 0.40268 0.51575 0.046745
## 3 0.01000 3 0.14009 0.14841 0.027584
##
## 2 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] BATHRM CNDTN GRADE LANDAREA LONGITUDE YR_RMDL
##
## Root node error: 7.9983e+14/8366 = 9.5605e+10
##
## n= 8366
##
## CP nsplit rel error xerror xstd
## 1 0.277166 0 1.00000 1.00020 0.031569
## 2 0.078454 1 0.72283 0.72322 0.026383
## 3 0.023777 2 0.64438 0.64505 0.024473
## 4 0.019313 3 0.62060 0.63149 0.024033
## 5 0.016052 6 0.56266 0.58854 0.018774
## 6 0.015621 7 0.54661 0.57720 0.018583
## 7 0.010000 8 0.53099 0.55108 0.017595
##
## 7 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] ASSESSMENT_NBHD BATHRM CNDTN LANDAREA
##
## Root node error: 1.1972e+15/10145 = 1.1801e+11
##
## n= 10145
##
## CP nsplit rel error xerror xstd
## 1 0.196204 0 1.00000 1.00029 0.030107
## 2 0.072358 1 0.80380 0.80436 0.030167
## 3 0.070408 2 0.73144 0.72248 0.027881
## 4 0.034886 3 0.66103 0.66247 0.022223
## 5 0.033104 4 0.62614 0.62925 0.021655
## 6 0.023320 5 0.59304 0.59604 0.020256
## 7 0.013649 6 0.56972 0.57321 0.019252
## 8 0.010940 7 0.55607 0.56925 0.019017
## 9 0.010491 8 0.54513 0.55795 0.018692
## 10 0.010000 9 0.53464 0.55152 0.018536
##
## 6 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] BATHRM CNDTN LONGITUDE YR_RMDL
##
## Root node error: 3.7421e+14/6375 = 5.87e+10
##
## n= 6375
##
## CP nsplit rel error xerror xstd
## 1 0.165473 0 1.00000 1.00051 0.045743
## 2 0.066390 1 0.83453 0.84301 0.044579
## 3 0.057505 2 0.76814 0.76914 0.043544
## 4 0.024391 3 0.71063 0.72054 0.042316
## 5 0.023740 4 0.68624 0.69404 0.042093
## 6 0.022582 5 0.66250 0.67905 0.042912
## 7 0.012376 6 0.63992 0.64854 0.042302
## 8 0.012240 7 0.62754 0.64014 0.042290
## 9 0.011013 8 0.61530 0.62506 0.042264
## 10 0.010000 9 0.60429 0.62334 0.042775
##
## 4 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] LANDAREA LATITUDE ROOMS YR_RMDL
##
## Root node error: 4.3996e+18/8803 = 4.9979e+14
##
## n= 8803
##
## CP nsplit rel error xerror xstd
## 1 0.170869 0 1.00000 1.00014 0.061531
## 2 0.156451 2 0.65826 0.69704 0.039667
## 3 0.037539 3 0.50181 0.51568 0.033452
## 4 0.013150 4 0.46427 0.47250 0.028343
## 5 0.010288 7 0.42378 0.45806 0.026004
## 6 0.010000 8 0.41349 0.44841 0.024739
##
## 5 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] BATHRM CNDTN LANDAREA LONGITUDE ROOMS YR_RMDL
##
## Root node error: 7.5523e+14/6588 = 1.1464e+11
##
## n= 6588
##
## CP nsplit rel error xerror xstd
## 1 0.159156 0 1.00000 1.00020 0.044150
## 2 0.068590 1 0.84084 0.84201 0.044991
## 3 0.021643 5 0.56649 0.56738 0.030395
## 4 0.019608 6 0.54484 0.56038 0.030055
## 5 0.013844 7 0.52523 0.54321 0.028563
## 6 0.012245 8 0.51139 0.52007 0.026621
## 7 0.010000 9 0.49915 0.50312 0.025468
##
## 8 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] LANDAREA LATITUDE ROOMS YR_RMDL
##
## Root node error: 4.4505e+14/4559 = 9.7621e+10
##
## n= 4559
##
## CP nsplit rel error xerror xstd
## 1 0.058806 0 1.00000 1.00045 0.37075
## 2 0.020213 6 0.64717 0.96786 0.37745
## 3 0.017139 7 0.62695 0.94712 0.37730
## 4 0.015395 8 0.60981 0.93594 0.37733
## 5 0.010000 9 0.59442 0.92424 0.37730
##
## 9 :
## Regression tree:
## rpart(formula = PRICE ~ BATHRM + ROOMS + LANDAREA + LATITUDE +
## LONGITUDE + BATHRM + ROOMS + LANDAREA + FIREPLACES + YR_RMDL +
## GRADE + CNDTN + ASSESSMENT_NBHD, data = ward, method = "anova")
##
## Variables actually used in tree construction:
## [1] CNDTN LANDAREA LATITUDE ROOMS YR_RMDL
##
## Root node error: 1.799e+14/2760 = 6.5181e+10
##
## n= 2760
##
## CP nsplit rel error xerror xstd
## 1 0.047072 0 1.00000 1.00084 0.20906
## 2 0.020020 3 0.85879 0.90771 0.20863
## 3 0.015218 8 0.75468 0.80213 0.20142
## 4 0.013995 9 0.73946 0.79516 0.20131
## 5 0.013834 14 0.66730 0.79304 0.20131
## 6 0.011462 15 0.65347 0.78981 0.20112
## 7 0.010000 16 0.64200 0.74984 0.20154
##
## 3 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.6773805 (k = 6), 0.6830052 (k = 7), 0.6870229 (k = 8), 0.6862194 (k = 9), 0.6826035 (k = 10)
## 2 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.5702677 (k = 6), 0.5869981 (k = 7), 0.582696 (k = 8), 0.5898662 (k = 9), 0.5927342 (k = 10)
## 7 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.5731179 (k = 6), 0.5813953 (k = 7), 0.582972 (k = 8), 0.587702 (k = 9), 0.5857312 (k = 10)
## 6 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.5577164 (k = 6), 0.5526976 (k = 7), 0.555207 (k = 8), 0.5495609 (k = 9), 0.5451694 (k = 10)
## 4 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.6942299 (k = 6), 0.7005906 (k = 7), 0.6933212 (k = 8), 0.6974103 (k = 9), 0.6969559 (k = 10)
## 5 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.5689132 (k = 6), 0.5731633 (k = 7), 0.5743777 (k = 8), 0.5707347 (k = 9), 0.5798421 (k = 10)
## 8 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.727193 (k = 6), 0.7412281 (k = 7), 0.7289474 (k = 8), 0.7324561 (k = 9), 0.7298246 (k = 10)
## 9 : LANDAREA YR_RMDL LATITUDE ROOMS CNDTN LONGITUDE GRADE BATHRM ASSESSMENT_NBHD
##     kNN_acc: 0.7550725 (k = 6), 0.7681159 (k = 7), 0.7536232 (k = 8), 0.7536232 (k = 9), 0.7449275 (k = 10)